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Abstract Water density fluctuations are an important statistical mechanical observable that is 
related to many-body correlations, as well as hydrophobic hydration and interactions. Local water 
density fluctuations at a solid-water surface have also been proposed as a measure of its hydropho- 
bicity. These fluctuations can be quantified by calculating the probability, P V (N), of observing 
iV waters in a probe volume of interest v. When v is large, calculating P V (N) using molecular dy- 
namics simulations is challenging, as the probability of observing very few waters is exponentially 
small, and the standard procedure for overcoming this problem (umbrella sampling in N) leads 
to undesirable impulsive forces. Patel et at [J. Phys. Chem. B, 114, 1632 (2010)] have recently 
developed an indirect umbrella sampling (INDUS) method, that samples a coarse-grained particle 
number to obtain P V (N) in cuboidal volumes. Here, we present and demonstrate an extension of 
that approach to other basic shapes, like spheres and cylinders, as well as to collections of such 
volumes. We further describe the implementation of INDUS in the NPT ensemble and calculate 
P V (N) distributions over a broad range of pressures. Our method may be of particular interest in 
characterizing the hydrophobicity of interfaces of proteins, nanotubes and related systems. 

Keywords umbrella sampling, density fluctuations, free energy calculations, hydrophobicity 



1 Introduction 

Quantifying density fluctuations in a condensed phase is interesting from a statistical physics per- 
spective. For example, the probability P V (N) of finding TV fluid particles in a probe volume v 
contains information about many-body correlations in the fluid. Calculations of P V (N) in liquid 

* To whom correspondence should be addressed. Email: patelalO@rpi.edu or chandler@berkeley.edu or 
gardes@rpi.edu 

Amish J. Patel ■ Shekhar Garde 

Howard P. Isermann Department of Chemical & Biological Engineering, and Center for Biotechnology and 
Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180 

Patrick Varilly • David Chandler 

Department of Chemistry, University of California, Berkeley, CA 94720 



2 



water have significantly enhanced our understanding of hydrophobicity. In particular, as the hydra- 
tion of an idealized solvent-excluding hydrophobic solute is equivalent to the creation of a cavity 
with the same size and shape as that of the solute, the excess free energy, ^t ex , of solute hydration 
is -fceT logP„(0) pQ. In 1996, Hummer et at showed that in bulk water, P V (N) distributions are 
gaussian for small spherical volumes containing fewer than ten water molecules on average [5| . This 
simplicity formed the basis for an information theoretic model that could predict the thermodynam- 
ics of hydrophobic hydration and the association of small solutes over a range of conditions, using 
only the readily available information on the average density and the water radial distribution 
function ^2,3,4!- Gaussian statistics of density fluctuations [S] also underlies the Pratt-Chandlcr 
theory [§J, which employs the same information to estimate pair correlation functions for small 
hydrated hydrophobic species. 

While small solutes can be accommodated in cavities that are formed spontaneously by thermal 
fluctuations in bulk water, solvating large solutes requires forming a liquid-vapor-like interface [7j 
[8l[9]. As a result, the nature of density fluctuations in large volumes is more complex. The Lum- 
Chandler- Weeks (LCW) theory captures the lengthscale dependence of hydration quantitatively by 
combining the physics of gaussian density fluctuations and that of interface formation |5]. Specif- 
ically, it predicts that while P V (N) for large volumes is gaussian around the mean, the low-iV 
wings of the distribution are enhanced substantially [TOlITT] . Quantifying these rare water fluctua- 
tions in large volumes is essentially impossible in equilibrium molecular simulations, and requires 
non-Boltzmann or umbrella sampling methods |12) . Straightforward umbrella sampling of N, is 
further complicated by the fact that N is a discontinuous function of particle coordinates, resulting 
in impulsive forces, which are difficult to treat in typical molecular dynamics (MD) simulations. 
To circumvent this difficulty, Patel et al. recently introduced an indirect umbrella sampling (IN- 
DUS) method in which N is sampled indirectly, by biasing a coarse-grained variable, N, which 
is strongly correlated with TV but varies continuously with particle coordinates [13] . The original 
implementation of INDUS is suitable only for cuboidal volumes, and showed that for large volumes 
in bulk water, P V (N) indeed deviates significantly from gaussian behavior at low N, reflecting the 
underlying physics of interface formation 13J . 

Application of INDUS to sample density fluctuations in large volumes in interfacial environments 
showed that fluctuations near hydrophilic surfaces are similar to those in bulk, but near hydrophobic 
interfaces, the probability of density depletion is significantly enhanced [15] . The ability to calculate 
P V (N), and especially ^ ex = —k B T logP„(0), in large volumes near interfaces also allowed us to 
calculate the binding free energies of hydrophobic cuboids to surfaces with a range of chemistries 14! , 
and these binding free energies were shown to correlate with the macroscopic wetting properties of 
the surfaces. Thus, P V (N) is a potential molecular measure of hydrophobicity, which may enable 
the characterization of surfaces of proteins and biomolecules that exhibit nanoscale roughness and 
chemical heterogeneity fi"4 | ll5 1 ll6" l ll7|. 

Here, we extend INDUS such that it can be used to umbrella sample probe volumes of other 
regular shapes, e.g., with cylindrical and spherical symmetry, as well as intersections and unions of 
collections of such regular volumes and their complements. While the ideas underlying the extension 
are simple, they considerably widen the scope of the method. For example, they allow umbrella 
sampling of arbitrarily shaped volumes, enabling faithful characterization of flucuations in the 
hydration shells of ions, nanoparticles, nanotubes, and the rugged surfaces of proteins. 

We also extend the method to work in the NPT ensemble. Previous applications of INDUS were 
performed in the NVT ensemble with a buffering vapor-liquid interface. While the two schemes yield 
indistinguishable results at low pressures, the present extension allows access to a much broader 
range of pressures. We begin by describing the INDUS method of Ref. [13] , which is suitable for 
cuboidal probe volumes, and introduce the pertinent equations, which lays down the framework 
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for extending the method to other regular volumes. We then generalize these equations to volumes 
of more general shapes and to collections of such volumes, and describe how INDUS affects the 
calculation of system pressure. Finally, we demonstrate these generalizations by calculating P V (N) 
in various noncuboidal shapes and at high pressures. 



2 The INDUS Method 

The number of particles, N, in a specific probe volume, v, changes discontinuously as the center of 
any particle crosses the surface of v. Hence, if the biasing potential, U, were chosen to be a function 
of N, it would result in impulsive forces. Instead, we choose U to be a function of a closely related 
coarse-grained particle number, TV, that is a continuous function of the positions, {n}, of all M 
particles in the system as, 

M 

N = ^fe(rj), where (la) 

i=i 

h(ri) = / <Z>(r-ri)dr. (lb) 

J V 

The integral in Eq. lb is over the probe volume v, and the integrand is a coarse-graining function, 
#(r), which we choose to be 

<P(r) = (p(x)(f>(y)<p(z), where (2a) 
tf>(a) = k- x [e- a ' '/ 2a2 - e' a ' /2a2 }9(a c ~ \a\). (2b) 

The function (f>(a), shown in Figure 1, is a gaussian that is truncated at |a| = a c , shifted down, and 
then scaled, so as to make it continuous and normalized. The normalization constant, fc, is equal to 
\j2no 2 erf(a c /\/2(T 2 ) — 2a c exp(— a 2 /2a 2 ) and 0(a) is the Heaviside step function. As the width of 
the gaussian, a, approaches 0, the function 4>(a) approaches the Dirac delta function 5(a) and N 
approaches N. The correlation between N and TV is thus strongest when a is smallest, but if a is 
chosen to be too small, the resulting biasing forces may be too large to handle correctly in typical 
MD simulations. 
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Fig. 1 Coarse-graining function, 4>(a), as defined in Eq. 2b, for q c = 2a. 
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For a cuboidal volume v, the integral in Eq. lb can be performed independently in the x, y and z 
directions. The result is 

M r = h x (xi)h y (yi)h z (zi), where (3a) 
h x (ii) = I <p(x-Xi)dx, (3b) 

and x m i n and x max are the coordinates of the faces of v perpendicular to the x-axis. The functions 
h y (yi) and h z (zi) are defined analogously. 




^min ^max 'max 



Fig. 2 The functions h a (ai), h a (oti) and its derivative, h' a (cti), for coordinates that have (a) two (a — > a;), 
(b) one (a — > r) or (c) zero (a — > 8) boundaries. 



Fig. 2a shows the function h x {xi) (equal to 1 for x m - m < xi < x max , and otherwise), which can 
be thought of as the x contribution to h(ri); that is, h(ri) = h x (xi)h y (yi)h z (zi) and N = /i( r i)- 
Fig. 2a also shows the function h x (xi), which varies continuously across the boundary of v, unlike 
h x (xi). The coarse-graining function h x (xi) differs from h x (xi) only in the thin boundary region of 
thickness 2x c . Thus, by ensuring that N and N are strongly correlated, we are able to influence N 
indirectly by biasing N. 

For a cuboidal probe volume, the x-component of the force on particle i due to the biasing 
potential, U(N), is given by 



_ dU dU dh(Ti) dU~ ~ - \ r ,v 



where the derivative of h x {xi), obtained by differentiating Eq. 3b and shown in Fig. 2a, is 

h' x {xi) = -[0(x max - Xi) - 0(x min - Xi)}. (5) 

It follows that the biasing forces act only on particles near the boundary of v, are finite, and are 
continuous functions of particle positions. 

To obtain P V (N) using INDUS, we perform n w simulations with different biasing potentials, 
Uj(N) (j — 1, . . . ,n w ), chosen such that the range of interest of N is well sampled. During each 
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simulation, we collect rij samples of N and N, denoted by Nji and Njj (I = 1, . . . , rij), in essence, 
sampling the biased joint distribution function, P v (N, N) . We then unbias and stitch together the n w 
biased joint distribution functions by using the weighted histogram analysis method (WHAM) p~8j 
[19] . Finally, we integrate out the unbiased joint distribution function to obtain P V (N), which is 
given by 

J= i 1=1 E l= i^e ^ 

where S nm is the Kronecker delta function, and C and {cj} are normalization constants. These are 
chosen self-consistently via the standard WHAM equations, 

C-^VV — -. -, and (7a) 



e 



-Pc k 



0=1 1= 



0=1 l 



3 Extension of INDUS to noncuboidal volumes 

While several coarse-graining schemes are possible for defining N, a practically useful definition must 
satisfy the following three conditions: (i) N must be a continuous function of particle positions, (ii) 
N and N must be strongly correlated, and (iii) the calculation of N and its derivatives should be 
straightforward. The choice of the form of Eq. 2a for cuboid volumes allows h(r{) to be expressed 
as a product of independent contributions from x, y, and z coordinates (as in Eq. 3a). While this 
formulation is particularly convenient for cuboidal volumes, the integral (Eq. lb) that defines h(v\) 
would not be independent in the three coordinates for other regular volumes, such as spheres or 
cylindrical shells. Thus, calculating h{y\) and its gradient efficiently at every MD step would not 
be straightforward. To circumvent this complication, we bypass defining h(v\) via a coarse-graining 
function <P as in Eq. lb, and instead, define it directly as a product of independent contributions 
from the three co-ordinates (as in Eq. 3a) in the relevant co-ordinate system (e.g., cylindrical, 
spherical, etc.) as, 

Mri) = IlM a ' ; )- ( 8 ) 

Oi 

Here a represents the coordinates component index (x, y or z in Cartesian coordinates; r, 9 or z 
for cylindrical ones, etc.) and h a (a,i) may be defined in a manner analogous to h x (xi) (Eq. 3b and 
Fig. 2a). 

However, unlike cuboidal volumes, where each coordinate component has two boundaries (e.g., 
x m i n and x max ), the components in spherical or cylindrical systems may have either one boundary 
(e.g., the r coordinate for a spherical v), or no boundaries (e.g., the 9 coordinate for a cylindrical 
v). These cases are illustrated in Fig. 2 and the expressions for h a (cxi) and h' a (oti) in each case are 
as follows: 
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- Two boundaries: a m i n < a < a max . 



h a {cti) 



+ 



fcicrf 
fcierf 



y/2a 
V2a 



fe(« mM - a») 



1 



fc 2 (ai - amin) 



+ 0[ a c + - (a max - a min ) 



1 



a« 2 (amin "I - ^max) 



0(a c - |a max - 
<9(a c - |a* - a min |) 
, and 



h' a (ati) = -[</>(a max - Q-i) - 0(a min - aj)], 
where fci = k~ 1 y / ira 2 /2 and fc 2 = fc" 1 exp(-a;?/2(7 2 ). 
One boundary: a < a max . 



fcierf 



amax a^ 



fc2(a max - oti) 



y/2a 

+ 0(a c + a max - at), and 

No boundaries: 

h a {ai) = 1, and 



6>(a c - |a max - ai|) 



The forces are then given by 



fx,i 

dhjri) 

dxi 



h' a (on) = 0. 

dU dh(ri) 

dN dx t ' 



with 



9a 



where doii/dxi is an element of the Jacobian for the coordinate transformation. 



(9a) 
(9b) 



(10a) 
(10b) 

(11a) 
(lib) 



(12a) 
(12b) 



4 Generalization to collections of probe volumes 

The above approach can be generalized to calculate P V (N) in a probe volume v that is constructed 
from unions (va U«b) and intersections (va Hub) of regular subvolumes (va, vb) and their comple- 
ments (va 1 , v b> ) • The subvolumes need not be of the same size or shape. When v is constructed from 
subvolumes using the complement, intersection and union operations, the corresponding definition 
of h(ri) is constructed by noting that, 

U A '^ = 1-U A \ (13a) 
ft(^) = Pft(B) iand (13b) 

- h (AUB) = l _~ h (A')~ h (B')_ (13c) 
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Here, the superscript (A) indicates that the function is evaluated with respect to the boundaries 
of sub- volume va- For the special case of a probe volume v that is a union of G non-overlapping 
sub-volumes {vk} (k = 1, . . . , G), the above prescription yields, 

G 

h(n) = ~ h(k) ( r *)> where ( 14a ) 

k=l 

U k \r i ) = H~h^(a i ). (14b) 

a 

Once again, the force on particle i resulting from a biasing potential, U, is finite and continuous 
everywhere, and is given by 

dU dN , , . 

f - = ~Md^> where (15a) 

f=V^H (15b) 

k— 1 

The recipe given in Eqs. 9-12, when applied to Vk can be used to evaluate hof* and dh^/dxi in 
Eqs. 14b and 15b. 



5 INDUS in the NPT ensemble 

When calculating P V (N) using simulations in the NVT ensemble, as was done in Ref. [13 , it is 
important to have a vapor bubble or a vapor-liquid interface in the simulation box. This vapor 
bubble can be nucleated, e.g., by applying a particle excluding field far from v, and can grow or 
shrink to accommodate water molecules pushed into or out of v. The resulting effective pressure 
of the system is close to the saturation vapor pressure of the fluid. Alternatively, we can perform 
simulations in the NPT ensemble without such a bubble, as long as the forces resulting from the 
umbrella potential are included in the calculation of the system pressure, V . If v is fixed in space 
and does not move, grow or shrink as the simulation box dimensions fluctuate, then the contribution 
of the umbrella potential to V is 

8TI 1 M 

P Umb = -^ = 3^£ r ^ f r b : (16) 

1=1 

where f^mb j g ^ e umbrella force on particle i, calculated as described in the preceding sections, 
and V is the system volume. 



6 Results 

We illustrate the utility of the INDUS method by calculating P V (N) distributions for volumes 
of different shapes in bulk water. Biased MD simulations of bulk water were performed using 
the packages LAMMPS and GROMACS [2"Tll2"2"] . modified in-house to implement INDUS. For the 
parameters of the coarse-graining function 4>(a) in Eq. 2b, we used a = 0.1 A and a c = 0.2 A (NVT 
ensemble) or a c = 0.3 A (NPT ensemble). Each simulation box used contained several thousand 
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(N - (N))/y/{5N2j A/v, mil" 1 

Fig. 3 (a) logP„ as a function of (N — (N))/ y/ {5N 2 } for volumes of four different shapes: a sphere of 
radius 0.6 nm, a cube of side 0.9 nm, a cylinder of radius 0.3 nm and length 3nm, and a thin cuboid of 
dimensions 0.3 nm x 1.6 nm x 1.6 nm. (b) The ratio of /i ex to surface £1X621 SiS £L function of A/v for the 
four different sha pes. The dashed line represents the surface tension, 700, of a vapor- liquid interface of 
SPC/E water [20] . 



water molecules, modeled with the extended simple point charge water model (SPC/E) [23], and 
was periodic in all directions. 

We selected volumes of four different shapes (a sphere, a cube, a cylinder, and a cuboid; see 
Figure 3), each with an average number of water molecules, (N), between 25 and 30. For these large 
volumes, INDUS allows us to measure probabilities for rare water fluctuations that are rather small 
(P V (Q) ~ 10~ 30 ), whereas calculations using straightforward equilibrium simulations [5] provide 
accurate estimates only for much smaller volumes ((N) f=s 8 with corresponding P v (0) ~ 10 -8 ). 

Although the volumes of the shapes that we have selected are similar to each other, they are not 
identical. Therefore, to compare them, in Figure 3a, we plot P v as a function of (N-{N))/^{6N 2 ), 
where (5N 2 ) is the variance of N. Near the mean, fluctuations are gaussian for all shapes, as ex- 
pected. However, there are deviations from such gaussian behavior in the tails of P V (N). Specifically, 
the smaller a shape's surface-area to volume ratio, the fatter the low-JV tail. 

In the large lengthscale limit, interface formation governs the free energy of cavity formation. 
LCW theory [8 predicted, and subsequent simulation studies verified 24,25,26], that the grad- 
ual crossover from small to large lengthscale physics occurs around 1 nm, which is roughly the 
lengthscale of volumes selected here. Thus, we expect that shapes with smaller surface areas will 
have lower free energies of cavity formation and correspondingly fatter low N tails, as observed in 
Figure 3a. Figure 3b further confirms that the free energy is governed by the physics of interface 
formation: the ratio of /i cx to the surface area of the probe volume, A, which can be interpreted as 
an apparent surface tension for these nanoscopic objects, is approximately constant, independent 
of the shape of v. This apparent surface tension is lower than the surface tension of a vapor- liquid 
interface, consistent with results of Patel et al. [13] , 
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Fig. 4 P V (N) obtained by umbrella sampling a probe volume that spells, 'I N D U S'. The volume is 
composed of 156 cubic subvolumes of side 0.25 nm. The inset shows a superposition of five independent 
configurations, taken from an MD simulation with a strong biasing potential that empties the probe volume. 
The red spheres represent water oxygens. The letter T in the inset is 0.5 nm wide and 2.0 nm tall. 

In Figure 4, we demonstrate the generalization of INDUS by calculating P V (N) in an arbitrarily 
shaped volume that is a collection of non-overlapping sub- volumes. The volume that we have chosen 
spells, 'I N D U S', using a collection of 156 cubic sub-volumes, each with a side of 0.25 nm. 





Fig. 5 Comparing P V (N) for a cube of side 0.9 nm, obtained using simulations in the NPT ensemble 
(V = lbar) with that obtained from simulations in the NVT ensemble with a buffering vapor-liquid interface 
located far from v. 
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In Figure 5, we show that for a cube of side 0.9 nm, the P V (N) distribution calculated in the 
NPT ensemble at a pressure, V = 1 bar, is identical to that obtained in the NVT ensemble with 
a buffering vapor-liquid interface. This is expected since Vv <C k^T <C "/A, so the energetics of 
emptying v is governed almost entirely by the cost of forming an interface (Figure 3b). The effective 
pressure in the NVT system is the coexistence pressure, V* , at T — 300 K, which is close to 0.06 bar. 
Since, V*v < Vv <C k^T, simulations in the NVT ensemble are an excellent approximation to NPT 
simulations at 1 bar. 




(N - (N))/-s/(5W) V, kbar 

Fig. 6 (a) logP„ as a function of (N - (N))/ ^{SN 2 } for a cube of side 1.2 nm, calculated in the NPT 
ensemble, over a range of pressures at T = 300 K. (b) Free energy, /x ex , of the same cube as a function of 
pressure. A linear fit yields the excess volume of the cavity, i> ox « 0.67v. 

The ability to calculate P v (N) in the NPT ensemble allows us to study its pressure dependence 
systematically. In Fig. 6a, we show P V (N) distributions in a cube of side 1.2 nm over a broad range 
of pressures. For pressures of 1 kbar and higher, the Vv term is no longer negligible, and opposes 
emptying v. Correspondingly, the low-iV fat tail disappears gradually with increasing pressure. We 
also show in Fig. 6b that the free energy of hydrating the cubic cavity increases roughly linearly 
with pressure. The slope of fi ex versus V is the excess volume for solvating the cavity, and is equal 
to 0.67u for this cubic probe volume. 



7 Conclusions 

Given the importance of density fluctuations in understanding a range of solvation phenomena [30 
[27 l l2"8 l l29 l l30| . we anticipate that the INDUS method will be of broad interest. For instance, the size 
of density fluctuations at interfaces has been proposed recently as a measure of interface hydropho- 
bicity [14, 15, IrJlll7j. The extended INDUS method is capable of characterizing hydrophobicity in 
complex environments that exhibit chemical heterogeneity [TBllBTj . complex topography 32 33"l[3~Tj . 
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and confinement 31, 35, 36 ,37, 38, 39J . The ability to calculate P V (N) over a range of pressures using 
the NPT ensemble will be useful in studying the effect of pressure on biomolecular structure, and 
especially in quantifying the hydration contribution to the pressure denaturation of proteins |40j . 
Finally, quantifying P V (N) in a region surrounding a solute molecule constitutes an important con- 
tribution in the quasichemical theories of solvation [^TIH2] . Our extension of INDUS can be readily 
applied to quantify that contribution for a solute of arbitrary shape and size. 
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